Abstract

How to recognise whether an observed person walks or runs? We consider a dy- 
namic environment where observations (e.g. the posture of a person) are caused 
by different dynamic processes (walking or running) which are active one at a time 
and which may transition from one to another at any time. For this setup, switch- 
ing dynamic models have been suggested previously, mostly, for linear and non- 
linear dynamics in discrete time. Motivated by basic principles of computations 
in the brain (dynamic, internal models) we suggest a model for switching non- 
linear differential equations. The switching process in the model is implemented 
by a Hopfield network and we use parametric dynamic movement primitives to 
represent arbitrary rhythmic motions. The model generates observed dynamics by 
linearly interpolating the primitives weighted by the switching variables and it is 
constructed such that standard filtering algorithms can be applied. In two experi- 
ments with synthetic planar motion and a human motion capture data set we show 
that inference with the unscented Kalman filter can successfully discriminate sev- 
eral dynamic processes online. 



1 Introduction 

Humans are extremely accurate when discriminating rapidly between different dynamic processes 
occurring in their environment. For example, for us it is a simple task to recognise whether an 
observed person is walking or running and we can use subtle cues in the structural and dynamic 
pattern of an observed movement to identify emotional state, gender and intent of a person. It has 
been suggested that this remarkable perceptual performance is based on a learnt, generative model 
of the dynamic processes in the environment which the brain uses to infer the current state of the 
environment given sensory input [1]. Motivated by this view we propose a generative model based
on continuous-time dynamics which models different dynamic processes in the environment by 
switching between different nonlinear differential equations. Using online inference in this model 
we can rapidly and accurately discriminate between different dynamic processes, e.g., motions. 

In the model, a Hopfield network [2] implements the dynamics between switching variables (the
"switching dynamics"). The Hopfield dynamics implements a winner-take-all mechanism between
arbitrarily many switching variables such that only one of the switching variables is active in each
stable fixed point of the dynamics. We associate each switching variable with different parameters
of a parametric differential equation implemented by a dynamic movement primitive (DMP) [3, 4].
The parameters are then interpolated based on the continuous values of the switching variables and
the resulting differential equation (the observation dynamics) is used to generate observations.

Exact inference in the nonlinear, continuous-time, hierarchical model is intractable. Here, we show 
that a standard filtering procedure (the unscented Kalman filter) enables efficient, robust and, impor- 
tantly, online discrimination of dynamic processes. We illustrate these features using experiments 
with (i) simple nonlinear movements in a plane and (ii) motion capture data of a human walking in
different styles. 



1.1 Related Work 

Switching dynamic models are well-established in statistics [5, 6], signal processing [7] and machine
learning [8, 9, 10]. In contrast to these models, we define both the dynamic models and the
switching variables, using nonlinear dynamical systems with continuous states running in continu- 
ous time. This allows us to link our model more easily to computations implemented in analogue 
biological substrate such as the brain. Additionally, a formulation in continuous time allows us to 
easily perform time-rescaling of dynamical systems 0. 

More recently, other continuous-time switched dynamic models have been proposed, for example, 
nonparametric models [11, 12] which extend Bayesian online change point detection [13, 14] using
Gaussian processes. Although online inference methods for these models have been described, 
their aim is not to identify a known dynamic process, but rather to make accurate predictions of 
observations across change points at which the underlying dynamic process changes. Similarly, 
switched latent force models [15] are nonparametric models in which the position of change points
and the underlying dynamic processes are modelled using Gaussian processes and DMPs. The 
proposed inference method is offline, i.e., all observed data are used and again the aim is not to 
discriminate between different, previously learnt models. Rather, this approach could be used to 
learn parametric models based on the obtained change point posterior. 

In [16] the authors derive a smoothing algorithm based on variational inference for an Ornstein-Uhlenbeck
(OU) process which is switched by a random telegraph process. Thus, this model can
only switch between two constant drifts. Similarly, [17, 18] propose Markov chain Monte Carlo
inference for a switched OU process where the number of different parameter sets as well as the pa- 
rameter values are automatically determined from the data. However, these parameters are limited 
to the constant drift and diffusion parameters of the OU process which cannot implement generic 
nonlinear dynamics. In contrast to these models, we approximate a change point process using our 
continuous-valued switching dynamics which allows us to maintain a coherent continuous frame- 
work and apply standard filtering algorithms. 

In the following we describe the present model and the proposed inference method in detail and 
subsequently present results of our experiments in Section 5.



2 Switching Dynamics 



We want the switching dynamics to be able to form a stable representation of the identity of a dy- 
namical system. We implement this requirement with a Hopfield network [2] which defines winner- 
take-all dynamics using lateral inhibition between units in a fully connected network. In particular, 
the network is defined as 

$$\dot{z} = k \left( M \sigma(z) + b^{lin} (g \mathbf{1} - z) \right) + w \qquad (1)$$

where $z \in \mathbb{R}^N$ are the state variables (one for each of the N dynamical systems in the observation
dynamics), $k$ is a rate constant, $\sigma$ is a multidimensional logistic sigmoid function, $b^{lin}$ is a parameter
determining the strength with which the states tend to converge to the goal $g$, $\mathbf{1} \in \mathbb{R}^N$ is a vector
of ones, $M \in \mathbb{R}^{N \times N}$ is a connection matrix and $w \in \mathbb{R}^N$ is external input to the dynamical system.
Note that we will use these external inputs $w$ to induce the actual switching behaviour (see below).
Lateral inhibition for winner-take-all dynamics is implemented using

$$\sigma_i(z) = \sigma(z_i) = \frac{1}{1 + \exp(-r (z_i - o))} \quad \text{and} \quad M = b^{lat} (-\mathbf{1} + I) \qquad (2)$$

where now $r$ determines the slope and $o$ the centre of $\sigma$, $b^{lat}$ is a parameter determining the strength
of lateral inhibition, $\mathbf{1} \in \mathbb{R}^{N \times N}$ is a matrix of ones and $I$ is the identity matrix. The dynamics in eq.
(1) has two parts: 1) the lateral inhibition $M \sigma(z)$ and 2) the linear term $b^{lin} (g \mathbf{1} - z)$ which attracts
the state $z$ to $g \mathbf{1}$. Note that the fixed points of the system are implicitly given as

$$z_i^* = g - \frac{b^{lat}}{b^{lin}} \sum_{j \neq i} \sigma(z_j^*) \qquad \forall i \in 1, \ldots, N. \qquad (3)$$

One can see that the fixed points with one state $z_i \approx g$ while all others are $z_{j \neq i} \approx 0$ are local
minima of the underlying Lyapunov function and therefore stable [2] provided that $o = g$ and
$b^{lat} / b^{lin} = 2g$. In the experiments below we chose $g = 10$, $b^{lat} = 1.7$, $k = 4$, $r = 1$ and the
remaining parameters according to the given constraints.
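The winner-take-all behaviour of eqs. (1) and (2) can be checked numerically. The following is a minimal sketch, assuming forward-Euler integration with a step size of 0.01 and no external input ($w = 0$); the function names, the step size and the initial state (one unit clearly larger than the others) are illustrative choices and not taken from the experiments.

```python
import numpy as np

def logistic(z, r=1.0, o=10.0):
    """Logistic sigmoid of eq. (2) with slope r and centre o."""
    return 1.0 / (1.0 + np.exp(-r * (z - o)))

def hopfield_rhs(z, w, k=4.0, b_lat=1.7, g=10.0):
    """Deterministic right-hand side of eq. (1) with the parameters given in the text."""
    n = z.size
    b_lin = b_lat / (2.0 * g)                  # constraint b_lat / b_lin = 2g
    M = b_lat * (np.eye(n) - np.ones((n, n)))  # lateral inhibition, eq. (2)
    return k * (M @ logistic(z, o=g) + b_lin * (g - z)) + w

def simulate(z0, n_steps=6000, dt=0.01):
    """Forward-Euler integration without external input (assumed scheme)."""
    z = np.array(z0, dtype=float)
    w = np.zeros_like(z)
    for _ in range(n_steps):
        z = z + dt * hopfield_rhs(z, w)
    return z

# Unit 1 starts out clearly largest and should end up as the only active unit.
z_final = simulate([1.0, 8.0, 2.0, 0.0])
print(np.round(z_final, 2))
```

With these settings the initially largest state settles close to $g$ and the remaining states close to 0, in line with eq. (3).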

We define the output of the switching dynamics as 

$$v = \sigma_v(z) \qquad (4)$$

such that the elements of $v \in \mathbb{R}^N$ lie between 0 and 1. The output $v$ is then used as the weights
of a linear interpolation of the observation dynamics parameters: $\theta = \Theta v$, where $\theta \in \mathbb{R}^M$ are the
parameters currently used at the observation level and $\Theta \in \mathbb{R}^{M \times N}$ are the N sets of parameters
of the N dynamical systems in the observation dynamics. By setting the limits of $v$ to 0 and 1
we, therefore, ensure that only one set of parameters is active for each stable fixed point of the
switching dynamics. We here use the logistic sigmoid function (eq. 2) with $o = g/2$ and $r = 0.7$.
We chose $r = 0.7$ to increase the region within the interval $[0, g]$ for which the logistic function is
approximately linear while still maintaining $\sigma_v(0) \approx 0$ and $\sigma_v(g) \approx 1$. This setting eases inference of
$z$ later on.
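As a small illustration of eq. (4) and the interpolation $\theta = \Theta v$, the sketch below evaluates the output sigmoid at a stable fixed point. The parameter matrix `Theta` is a hypothetical example with M = 2 parameters for N = 3 systems; only $g = 10$, $o = g/2$ and $r = 0.7$ come from the text.

```python
import numpy as np

def output_sigmoid(z, g=10.0, r=0.7):
    """Switching output v = sigma_v(z): logistic with centre o = g/2 and slope r."""
    return 1.0 / (1.0 + np.exp(-r * (z - g / 2.0)))

# Hypothetical parameter matrix: column n holds the parameters of system n.
Theta = np.array([[1.0, 2.0, 3.0],
                  [0.5, 0.1, 0.9]])

z_fixed = np.array([10.0, 0.0, 0.0])  # fixed point in which unit 0 is active
v = output_sigmoid(z_fixed)           # weights of the interpolation
theta = Theta @ v                     # parameters used by the observation dynamics
print(np.round(v, 3), np.round(theta, 3))
```

At the fixed point the weights are roughly 0.97 for the active unit and 0.03 for the others, so `theta` is close to, but not exactly, the parameter set of the active system. This small amount of mixing is the price of the wider approximately linear region obtained with $r = 0.7$.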

Evolving the switching dynamics leads to the following behaviour: $z$ converges to the fixed point
$z_i \approx g$, $z_{j \neq i} \approx 0$ (fixed point $i$) where $i$ is the index of the state which was the largest at $t = 0$.
In the experiments below, when generating data, we induce switching by applying an external input
$w_k = 4$ for a short period of time in order to switch from fixed point $i$ to fixed point $k$.
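The effect of such a switching input can be sketched as follows. The pulse amplitude of 4 is taken from the text; the pulse duration (10 time units), the Euler step size and the function names are assumptions made for this illustration only.

```python
import numpy as np

def hopfield_rhs(z, w, k=4.0, b_lat=1.7, g=10.0, r=1.0):
    """Right-hand side of the switching dynamics with o = g and b_lat / b_lin = 2g."""
    n = z.size
    b_lin = b_lat / (2.0 * g)
    M = b_lat * (np.eye(n) - np.ones((n, n)))
    sigma = 1.0 / (1.0 + np.exp(-r * (z - g)))
    return k * (M @ sigma + b_lin * (g - z)) + w

def run_with_pulse(z0, target, pulse_steps, total_steps, dt=0.01, amplitude=4.0):
    """Apply a constant input to unit `target` for `pulse_steps` steps, then release it."""
    z = np.array(z0, dtype=float)
    trace = np.empty((total_steps, z.size))
    for step in range(total_steps):
        w = np.zeros_like(z)
        if step < pulse_steps:
            w[target] = amplitude
        z = z + dt * hopfield_rhs(z, w)
        trace[step] = z
    return trace

# Start in fixed point 0 and push the network towards fixed point 2.
trace = run_with_pulse([10.0, 0.0, 0.0], target=2, pulse_steps=1000, total_steps=5000)
winner_before = int(np.argmax(trace[0]))
winner_after = int(np.argmax(trace[-1]))
print(winner_before, winner_after)
```

After the input is released, the network stays in the new fixed point, which is what makes the switching variables a stable representation of the currently active dynamical system.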



3 Representing Dynamical Systems with Dynamic Movement Primitives 

In the observation dynamics we use a simplified form of rhythmic dynamic movement primitives 
(DMPs) [3, 4] to represent rhythmic dynamical systems. By using DMPs we gain the following
properties:

1. arbitrary, rhythmic motion is easily learnt

2. different DMPs describing different movements have a common parameterisation 

3. learnt dynamical systems represent a strongly attracting limit cycle which improves dis- 
criminability during inference 

We define a DMP as 

$$\dot{\omega} = f_\omega(v), \qquad \dot{s} = \alpha \left( f(\omega, v) - s \right), \qquad f(\omega, v) = W(v) \frac{k(\omega)}{\mathbf{1}^T k(\omega)} \qquad (5)$$

with phase variable $\omega$ and position variables $s$. The phase $\omega$ is governed by a constant drift which
implements the motion around the limit cycle. Its frequency $f_\omega(v) / (2\pi)$ depends on the switching
dynamics through $v$ (see Section 2). The position variables $s \in \mathbb{R}^D$ describe the actual motion
through D-dimensional space. For fixed phase they implement a simple linear point attractor
at $f(\omega, v)$ where $\alpha > 0$ determines the strength of attraction. Thus, for drifting phase the position
variables follow $f(\omega, v)$ such that the complete dynamical system implements a limit cycle
at the positions defined by $f(\omega, v)$. We choose $f(\omega, v)$ to be a linear combination of (normalised)
basis functions $k(\omega)$. In particular, we use circular von Mises basis functions as in [4]:
$k_j(\omega) = \exp((\cos(\omega - c_j) - 1) / \ell^2)$ with width $\ell > 0$ which reach their maximum value of 1 at
centres $c_j$.

In the experiments below we fixed $\alpha = 1$, which is sufficiently large such that the states quickly
return to the limit cycle when perturbed, but other values work similarly well. We determined $c_j$ and
$\ell$ from the number of basis functions $C$ as follows: We evenly distributed the centres $c_j$ in $[0, 2\pi]$
and set $\ell = 2\pi / C$. We chose the frequency $f_\omega$ for each data set by an ad-hoc estimate of the cycle
period described by the data. More sophisticated methods are available [19], but were not required
for the demonstration considered here. Finally, we learnt the weights of the basis functions $W$ from
given positions $x(t_i)$, $i = 1, \ldots, T$ by computing $k(\omega)$ from the phases $\omega(t_i)$¹ and applied simple
least squares fitting to the resulting data set as suggested in [3, 4].
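The learning step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the centres are placed evenly without duplicating the point $0 = 2\pi$, the example trajectory is a synthetic ellipse, and the names `von_mises_basis` and `fit_weights` are hypothetical.

```python
import numpy as np

def von_mises_basis(omega, n_basis):
    """Normalised circular basis functions k(omega), one row per phase value."""
    centres = np.linspace(0.0, 2.0 * np.pi, n_basis, endpoint=False)  # assumed spacing
    width = 2.0 * np.pi / n_basis                                     # l = 2 pi / C
    k = np.exp((np.cos(np.atleast_1d(omega)[:, None] - centres) - 1.0) / width**2)
    return k / k.sum(axis=1, keepdims=True)

def fit_weights(positions, omega, n_basis):
    """Least-squares fit of W such that W k(omega(t_i)) approximates x(t_i)."""
    K = von_mises_basis(omega, n_basis)                    # shape (T, C)
    W_t, *_ = np.linalg.lstsq(K, positions, rcond=None)    # shape (C, D)
    return W_t.T                                           # shape (D, C)

# One period of an ellipse sampled at T = 50 time steps, starting at phase 0.
T, C = 50, 7
f_omega = 2.0 * np.pi / T                  # phase drift per time step
omega = f_omega * np.arange(T)
positions = np.column_stack([2.0 * np.cos(omega), np.sin(omega)])

W = fit_weights(positions, omega, C)
fit = von_mises_basis(omega, C) @ W.T
max_error = np.abs(fit - positions).max()
print(W.shape, max_error)
```

For smooth, low-frequency trajectories such as this ellipse, a handful of basis functions already reproduces the positions closely; trajectories with sharper features would need a larger $C$.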



¹ By default we set $\omega(t_1) = 0$ for all data sets.

Furthermore, we allow for an affine transformation of the position variables which produces the
actual observations of the model: y = As + b. 

4 Online Inference for Switching Dynamical Systems 

The dynamical systems described in Sections 2 and 3 are the basis for a hierarchically structured generative
model. We combine all dynamic states into a single vector $x$ such that $x = [w^T, z^T, \omega, s^T]^T$.
By assuming Gaussian noise on the variables in the model we obtain the following stochastic differential
equation

$$dx = \begin{bmatrix} -w \\ k \left( M \sigma(z) + b^{lin} (g \mathbf{1} - z) \right) + w \\ f_\omega(v) \\ \alpha \left( f(\omega, v) - s \right) \end{bmatrix} dt + B \, d\epsilon_x \qquad (6)$$

$$y = A s + b + \epsilon_y$$

where $v = \sigma_v(z)$, $d\epsilon_x$ represents a Wiener process, $B = \mathrm{diag}(\eta)$ with $\eta = [\eta_w^T, \eta_z^T, \eta_\omega, \eta_s^T]^T$
being the standard deviations for each variable such that $Q = B B^T$ is the corresponding covariance
matrix. The Gaussian measurement noise $\epsilon_y$ has covariance matrix $R = \mathrm{diag}(\eta_y^2)$. We model external
inputs to the Hopfield network $w$ with an Ornstein-Uhlenbeck process which fluctuates around
0. Although this process does not generate brief, switch-inducing inputs (as e.g. expected when
switching gaits), we will show below that it provides sufficient flexibility such that switches can be
reliably inferred from the data.
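To make the structure of the generative model concrete, the sketch below assembles the deterministic part of the stochastic differential equation for the state $x = [w^T, z^T, \omega, s^T]^T$ and takes one Euler-Maruyama step. Several details are assumptions for illustration: the interpolated parameters are taken to be the basis weights and the phase drift of each DMP (arrays `W_all` and `freqs`), the basis centres are spaced evenly without duplicating $0 = 2\pi$, and the integration scheme is not taken from the text.

```python
import numpy as np

def drift(x, W_all, freqs, k=4.0, b_lat=1.7, g=10.0, alpha=1.0):
    """Deterministic part of the model for x = [w, z, omega, s]."""
    N, D, C = W_all.shape
    w, z, omega, s = x[:N], x[N:2 * N], x[2 * N], x[2 * N + 1:]
    b_lin = b_lat / (2.0 * g)
    M = b_lat * (np.eye(N) - np.ones((N, N)))
    sigma = 1.0 / (1.0 + np.exp(-(z - g)))               # eq. (2) with r = 1, o = g
    v = 1.0 / (1.0 + np.exp(-0.7 * (z - g / 2.0)))       # switching output, eq. (4)
    centres = np.linspace(0.0, 2.0 * np.pi, C, endpoint=False)
    basis = np.exp((np.cos(omega - centres) - 1.0) / (2.0 * np.pi / C) ** 2)
    basis /= basis.sum()                                 # normalised basis functions
    W = np.tensordot(v, W_all, axes=1)                   # assumed interpolation of weights
    return np.concatenate([
        -w,                                              # Ornstein-Uhlenbeck input
        k * (M @ sigma + b_lin * (g - z)) + w,           # switching dynamics
        [freqs @ v],                                     # phase drift
        alpha * (W @ basis - s),                         # position dynamics
    ])

def euler_maruyama_step(x, dt, eta, rng, W_all, freqs):
    """One stochastic Euler step with per-variable noise standard deviations eta."""
    noise = eta * np.sqrt(dt) * rng.standard_normal(x.size)
    return x + dt * drift(x, W_all, freqs) + noise

rng = np.random.default_rng(0)
N, D, C = 2, 2, 7
W_all = rng.standard_normal((N, D, C))                   # hypothetical DMP weights
freqs = np.array([0.10, 0.15])                           # hypothetical phase drifts
x0 = np.concatenate([np.zeros(N), [10.0, 0.0], [0.3], np.zeros(D)])
x1 = euler_maruyama_step(x0, dt=0.1, eta=np.zeros(x0.size), rng=rng,
                         W_all=W_all, freqs=freqs)
```

Observations would then follow from the position part of the state via $y = As + b$ plus measurement noise.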

Our aim is to infer the state $z(t)$ of the switching dynamics based on the history of observed data
$y(\tau)$, $\tau \le t$ which is input to the observation dynamics. In general, this is a filtering problem which
could be solved optimally with the Kalman-Bucy filter [20], if the dynamical systems were linear.
For our nonlinear dynamical systems we have to resort to approximate filtering methods such as the
extended [20] or unscented Kalman filters (UKF) [21, 22], or particle filters [23]. Here, we used
the UKF because of its balanced trade-off between computational efficiency and nonlinear filtering
performance. While a continuous-time version of the UKF has been suggested [24], we found this
to be numerically unstable for the present filtering problem and instead we used the standard discrete
UKF with the following approximation in the prediction step: we numerically integrated the
deterministic part of eq. (6) starting from the current sigma points of the UKF to approximate the
nonlinearly transformed sigma points and then added $\Delta t \, Q$ to the estimated posterior covariance
instead of $Q$ (cf. eq. 19 in [24]). Here, $\Delta t$ is the time between the last and the new data point and
the factor of $\Delta t$ corresponds to the variance of noise which, prescribed by the Wiener process, has
been accumulating within $\Delta t$. We used standard settings of UKF parameters as reported in [22], i.e.,
we chose $\alpha = 0.01$, $\beta = 2$ and $\kappa = 3 - L$ where $L$ is the dimensionality of $x$, $L = 2N + D + 1$.
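The modified prediction step can be sketched as follows. This is an illustrative reading of the description above, not the implementation used in the experiments: the sigma points and weights follow the usual scaled unscented transform, the deterministic dynamics are integrated with a simple Euler scheme (an assumption), and the names `sigma_points` and `predict` are hypothetical.

```python
import numpy as np

def sigma_points(mean, cov, alpha=0.01, beta=2.0, kappa=None):
    """Scaled sigma points and weights with kappa = 3 - L by default."""
    L = mean.size
    if kappa is None:
        kappa = 3.0 - L
    lam = alpha**2 * (L + kappa) - L
    root = np.linalg.cholesky((L + lam) * cov)       # columns are sigma-point offsets
    points = np.vstack([mean, mean + root.T, mean - root.T])
    wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))
    wc = wm.copy()
    wm[0] = lam / (L + lam)
    wc[0] = wm[0] + (1.0 - alpha**2 + beta)
    return points, wm, wc

def predict(mean, cov, drift, Q, dt, n_substeps=10):
    """Integrate each sigma point over dt, then add dt * Q to the covariance."""
    points, wm, wc = sigma_points(mean, cov)
    h = dt / n_substeps
    for _ in range(n_substeps):
        points = points + h * np.array([drift(p) for p in points])
    new_mean = wm @ points
    dev = points - new_mean
    new_cov = (wc[:, None] * dev).T @ dev + dt * Q
    return new_mean, new_cov

# Toy linear dynamics, for which the unscented transform is exact.
A = np.array([[0.0, 0.5], [-0.5, 0.0]])
m0 = np.array([1.0, 2.0])
P0 = np.array([[0.5, 0.1], [0.1, 0.3]])
Q = 0.01 * np.eye(2)
m1, P1 = predict(m0, P0, lambda x: A @ x, Q, dt=0.2)
```

In the model, `drift` would be the deterministic part of eq. (6), and the update step of the standard discrete UKF is applied unchanged after this prediction.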

The UKF requires prior choices for the covariance matrices of the states and observations, Q and R. 
Below we used the same $Q$ for inference on both data sets where we set $\eta_w = 1$, $\eta_z = 0.5$, $\eta_\omega = 0.05$
and $\eta_s = 0.1$ (the scalars are expanded to all dimensions of the corresponding vectors). These
values have been manually selected to facilitate switching during inference and embody a high level
of prior uncertainty about the hidden states. Smaller values for $\eta$ may be chosen (up to two orders
of magnitude) with only moderate loss in discrimination performance. However, we found that $\eta_w$
should be kept high as otherwise the UKF will be more likely to miss a switch. The choice for $R$ is
experiment-specific and is described below. 

5 Experiments 

To provide a proof of concept, we modelled (i) synthetic data of planar motion and (ii) human 
motion capture data. The experiments below show that UKF filtering can discriminate between 
different motions. To measure performance we filtered a long stream of data consisting of a large 
number of trials between which the observed motion switched. The goal was to identify, as fast as 
possible, the DMP that caused the data, i.e., to infer switches between DMPs accurately. We report 
the fraction of correct model responses (or % correct of the total number of trials) where we define 
the system response as the identity of the maximal switching variable after a chosen number of time 
steps from the beginning of a trial: $\arg\max_i z_i(n \times \Delta t)$.
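The performance measure can be written down compactly. The sketch below assumes that the inferred switching variables are stored as an array with one row per time step and that all trials have the same length; the function name and the indexing convention (row `start + n` is the state n steps into a trial) are illustrative assumptions.

```python
import numpy as np

def fraction_correct(z_trace, true_labels, trial_length, n):
    """Fraction of trials whose arg-max response n steps into the trial is correct.

    z_trace: array of shape (T, N) with the inferred switching variables.
    true_labels: index of the DMP that generated each trial.
    """
    true_labels = np.asarray(true_labels)
    starts = np.arange(true_labels.size) * trial_length
    responses = np.argmax(z_trace[starts + n], axis=1)
    return float(np.mean(responses == true_labels))

# Toy example: three trials of five steps; the correct unit wins from step 2 on.
labels = [0, 1, 0]
z_toy = np.zeros((15, 2))
for t, label in enumerate(labels):
    z_toy[t * 5:t * 5 + 2, 1 - label] = 1.0
    z_toy[t * 5 + 2:(t + 1) * 5, label] = 1.0
print(fraction_correct(z_toy, labels, trial_length=5, n=2))
```

Evaluating this measure for increasing `n` gives the fraction of correct responses as a function of observation time.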



Table 1: All experiments: Fraction of correct responses as function of observation time. For 2D
ellipses $\Delta t$ is arbitrarily chosen. For walks $\Delta t$ corresponds to 33 ms in real time. Even for misspecified
models (observation noise in model, $\eta_y = 0.01$, up to two orders of magnitude smaller than
in data), the fraction of correct responses remains high for noisy input.

5.1 2D dots

We first illustrate inference in our model using a simple example of synthetic motions of dots which 
followed different trajectories on a plane. The chosen trajectories formed three different ellipses
and a figure-8 and were partially overlapping as depicted in Fig. 1(a). The observations were the
2D-positions of two phase-shifted (by $\pi$) dots. Thus, the online inference task was to determine by
which DMP the motion of the two dots was generated. The experimental procedure was as follows: 
1) learn the four DMPs from example data, 2) generate dynamically switched data from learnt DMPs 
(25 trials for each DMP) and 3) infer hidden variables from generated data. 

We set the size of time steps and frequency of oscillation arbitrarily such that a full period of the 
oscillation was reached after 50 time steps. We used C = 7 basis functions and the identity function 
as output function, i.e., A = I and b = 0, i.e. the dots directly plot the on-going dynamics. Hence, 
the number of observations was equal to the dimensionality of the position variables s, D = 4. 

For generating data from the switching model we set all prior uncertainties to 0.001 (negligibly low 
noise) unless specified otherwise below. We then generated 25 trials for each of the four motions
in the data set by simulating 50 · 25 · 4 time steps while randomly switching to a fixed point attractor
different from the current one every 50 time steps using the external input to the Hopfield dynamics
$w$.

For online inference based on eq. (6) and the generated stream of data we initialised the dynamic
variables in the model with random values. In particular, we started the switching dynamics randomly
in one of the fixed points, set the position variables $s$ in the observation dynamics to the
0-position on the trajectory of the chosen attractor and chose the phase variable $\omega$ randomly (uniformly)
in the interval $[-\pi/4, \pi/4]$. Note that the initialisation is only relevant for the very first trial
of each experiment as we define the beginning of all other trials as the time point of a switching
impulse in $w$. Further, we set the prior uncertainty of the observations to the true value $\eta_y = 0.001$.

For these data the model quickly switched into the correct attractor of the switching dynamics in all 
trials resulting in 100% correct responses after less than 15 observations into each trial, i.e., each
DMP was correctly identified after observing less than 30% of a cycle (cf. Table 1 and Fig. 2(a) for
a typical example).

We tested the robustness against noise by generating movement sequences as before, but with increased
observation noise $\eta_y$. For our choice of $\eta_y = 0.2$ the random steps in the plane introduced by noise were
on average more than twice as big as the steps introduced by the movements themselves, i.e., at any
single point in time random velocities masked the velocities prescribed by the movements (see Fig.
1(b)). Nevertheless, dynamic inference still performed at almost 100% correct responses but needed
longer into a movement cycle to reach high performance, compared to the noise-free case. Even
when the prior uncertainty was set to an inappropriately low value ($\eta_y = 0.01$), the performance
stayed above 90% (cf. Table 1) after having observed half the cycle.

5.2 Human Walks 

The second experiment is aimed at showing that the present model can in principle also be applied 
to complicated real-world motions. For this purpose we chose a motion capture data set of a human 
[Figure 1 panels: (a) noise-free trajectories, (b) noisy trajectories, (c) original, (d) noisy]

Figure 1: Illustration of the used motions. (a) Trajectories on which the two observed dots move for
the four synthetic planar motions. Motion direction indicated by arrows. (b) One example period
of each motion of (a) perturbed by observation noise with $\eta_y = 0.2$. (c) A typical motion capture
posture (one frame of a walk). Observed data are the 3D positions of the dots (connecting lines
shown for illustration purposes only). (d) Posture shown in (c) perturbed by noise as used in our
experiment.




Figure 2: Discriminating four planar motions. (a) Inference results for four example trials on noise-free
data. The observed motion switches every 50 time steps. Shown are (from left to right): (i)
the DMP positions $s$ (grey) and phase $\omega/(2\pi)$ (red, rising), (ii) the switching output $v$, (iii) switching
variables $z$, and (iv) the external input $w$. Dotted lines show the prediction errors (difference
between predicted states and inferred states after UKF update). Grey shading indicates two times
the posterior standard deviation of the corresponding variables (for switching output determined via
unscented transform from switching variables). The true sequence of movements in terms of colours
is light blue-green-yellow-red which is found by the model after a short transient period at the beginning
of each trial (as determined by finding the switching variable with the highest value). (b)
Same as (a), but for noisy movements. The inferred states become more noisy, but movements can
still be reliably discriminated.
Figure 3: Inference results for five example trials of the motion capture walks. Format of the figure
as in Fig. 2(a,b). (a) Results for original, concatenated motion capture data. (b) Results for same
data with added independent noise. In all five trials the model identified the correct walk (yellow -
green - dark blue - light blue - red). From seconds 85 and 94 in (a,b) the model transiently found an
alternative explanation for the data (e.g. for second 85 in (a) a linear combination of DMPs), but it
switched into the correct attractor of the switching dynamics after at most ca. 1 s of real time (ca. 30
motion capture frames) in all shown examples.



walking in four different, but similar styles². As our focus was on the switching model rather than
on presenting a complete model for motion capture data, we preprocessed the walks in the following
way: To focus on the important aspects of the walk dynamics we removed the global translation of
the body. We then computed the position of each motion capture marker in 3D space resulting in
a data set of 30 dots moving in 3D space (see Fig. 1(c)). We selected a subset of captured frames
which roughly contained one period of each walk and which spanned 4s of real time. Because the 
dynamics of the 30 dots were highly correlated across walks, we performed principal component
analysis (PCA) on the data of all walks such that all walks were represented in a common 6D space
(capturing ca. 98% of the original variance) and normalised the data in this space to mean 0 and
maximum 1 in each dimension. We estimated the period of each walk by estimating when the
trajectory of a walk came closest to finishing a cycle in the normalised space, resulting in 3.17s,
3.8s, 3.03s and 4s for the four walks. We used these periods to estimate the $f_\omega$ and we learnt four
corresponding DMPs in the normalised PCA space (using C = 12 basis functions). In addition, we
introduced another, trivial DMP with $W = 0$ to represent constant input which is used to infer the
absence of motion, e.g., when the walker stands still.

We integrated these five DMPs into the generative model, where the resulting number of switching 
variables z was N = 5 and the number of position variables s was D = 6. Further, we set the linear 
output transformation of the observation layer (A and b) such that it implemented the mapping from 
normalised PCA space to the original 3D marker positions. Therefore, the observations in the model 
were the 90-dimensional marker position vectors. 
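The preprocessing and the resulting output transformation can be sketched as follows. This is a minimal illustration on synthetic data: PCA is computed via a singular value decomposition, the normalisation is assumed to divide each centred component by its maximum, and the function name `fit_pca_mapping` is hypothetical. The returned `A` and `b` undo the normalisation and the projection, so that $y = As + b$ maps a point in normalised PCA space back to marker coordinates.

```python
import numpy as np

def fit_pca_mapping(Y, n_components=6):
    """Y: (T, P) marker coordinates. Returns normalised data S and the affine map (A, b)."""
    mean = Y.mean(axis=0)
    _, sing, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    V = Vt[:n_components].T                  # principal directions, shape (P, n_components)
    scores = (Y - mean) @ V                  # centred, so mean 0 in each dimension
    scale = scores.max(axis=0)               # assumed normalisation: maximum becomes 1
    S = scores / scale
    A = V * scale                            # equals V @ diag(scale)
    b = mean
    explained = float((sing[:n_components] ** 2).sum() / (sing ** 2).sum())
    return S, A, b, explained

# Synthetic stand-in for 30 markers in 3D (90 coordinates) driven by 6 latent signals.
rng = np.random.default_rng(1)
latent = rng.standard_normal((200, 6))
mixing = rng.standard_normal((6, 90))
Y = latent @ mixing + 5.0

S, A, b, explained = fit_pca_mapping(Y)
Y_reconstructed = S @ A.T + b
```

For real motion capture data the reconstruction is only approximate, since the discarded components carry the remaining variance.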

To render the inference performed by the system more challenging, we tested the model on the
original data (as opposed to data generated by the generative model) where we switched between
the four walks (and a fifth standstill walker to test switching to the trivial DMP) every 3s. This harsh
switching regime introduced jumps in the underlying phase of the walking cycle in addition to the
transition to a different walk. These data had a total length of 150s, consisting of 50 walks of 3s each.

² Motions 1, 5, 15 and 19 of subject 142 of the CMU motion capture database (http://mocap.cs.cmu.edu/),
corresponding to childish, depressed, sad and shy walks, respectively.

For inference we set the prior uncertainty of the observations to a low $\eta_y = 0.01$ and used the same
initialisation as in the first experiment. Example trials are shown in Fig. 3. Despite the increase
in complexity of the data (higher dimensionality, more dynamical systems), the model switched to
the true DMP within the first second in most cases and overall achieved 100% correct responses (cf.
Table 1).

In addition, the inference is robust against noise: We added Gaussian noise directly on the marker
positions with standard deviation equal to that of the marker positions ($\eta_y \in [2 \cdot 10^{-5}, 8.53]$) which,
in velocities, translated to 10 times larger noise than signal. Yet, with $\eta_y$ set to the noise standard
deviation, the model still gave ca. 90% correct responses (cf. Table 1). Furthermore, as before,
the model was robust to misspecification of the prior uncertainty $\eta_y$ (see Table 1), suggesting that
it is easy to apply the present model to real-world data sets for which the exact amount of noise is
unknown.

6 Discussion 

We have presented a model which switches nonlinear differential equations and have shown that, 
given observations, filtering using the unscented Kalman filter is sufficient to discriminate which 
dynamic process produced the observations, in an online fashion and by a simple arg-max readout 
of the inferred hidden variables. We illustrated the performance of the model using synthetic motion 
in a plane and realistic point light walker dynamics. By representing a switching process with a 
Hopfield network with random perturbations, the model achieves a trade-off between stable tracking 
of a dynamic process in the environment and faithful switching between these processes. Inference 
performed by the model was robust against noise and model misspecification. 

By using DMPs the model can represent arbitrary dynamic processes. Although we have only 
shown applications to rhythmic motion, DMPs can also be used to represent goal-directed motion 
[3, 4] which we expect to work equally well with the present switching model. We found that the
dynamics implemented by DMPs, i.e., a strongly attracting limit cycle, is important for successful 
inference of the switching variables. In preliminary tests we found that when using more flexible 
dynamic representations, such as standard recurrent neural networks with random connectivity, the 
model can explain variation due to one of the other dynamical systems by small adaptations of the 
state of the current dynamical system, as opposed to switching the dynamical system. This is not 
possible with DMPs, which generate only a single trajectory along a limit cycle. When greater 
flexibility of the representation is needed, the attractiveness of the limit cycle could be adjusted in 
the model (parameter $\alpha$ in Section 3). More importantly, we believe that other parametric models
may be used together with the switching dynamics to discriminate potentially more complicated 
motions, e.g., a hierarchical model which smoothly combines DMPs to represent a long sequence of 
motions. 

The unscented Kalman filter only tracks a single mode of a multimodal posterior. This approxima- 
tion can impact the discrimination performance of the model. Indeed, we found that the switching 
of the model during inference on the motion capture data was sometimes delayed (cf. Fig. 3 and
Table 1) which may be attributed to this limitation. Nevertheless, we found that the discrimination
performance was still robust against random (Gaussian) perturbations which were larger than the
actual changes induced by the motion. In particular, the increasing fraction of correct responses
with time (cf. Table 1) indicates that, once found, the correct posterior mode was, even under large
perturbations, mostly stable in our model. 

The performance of the model may be further improved by using inference methods which can 
approximate the full posterior distribution such as particle filtering. However, this would come
at the cost of increased computational demands, which might be unnecessary for a given set of dynamic
processes, as the examples shown here suggest.






References 

[1] K. Doya, S. Ishii, A. Pouget, and R. P. N. Rao, eds., Bayesian Brain. MIT Press, 2007.

[2] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of
two-state neurons," Proc Natl Acad Sci USA, vol. 81, pp. 3088-3092, May 1984.

[3] A. J. Ijspeert, J. Nakanishi, and S. Schaal, "Learning attractor landscapes for learning motor primitives,"
in Advances in Neural Information Processing Systems 15, pp. 1523-1530, Cambridge, MA: MIT Press,
2003.

[4] S. Schaal, P. Mohajerian, and A. Ijspeert, "Dynamics systems vs. optimal control - a unifying view,"
in Computational Neuroscience: Theoretical Insights into Brain Function (P. Cisek, T. Drew, and J. F.
Kalaska, eds.), vol. 165 of Progress in Brain Research, pp. 425-445, Elsevier, 2007.

[5] S. Frühwirth-Schnatter, Finite Mixture and Markov Switching Models. Springer, 2006.

[6] R. Chen and J. S. Liu, "Mixture Kalman filters," Journal of the Royal Statistical Society: Series B
(Statistical Methodology), vol. 62, no. 3, pp. 493-508, 2000.

[7] Z. Chen, "Bayesian filtering: From Kalman filters to particle filters, and beyond," tech. rep., Adaptive
Syst. Lab., McMaster Univ., Hamilton, Canada, 2003.

[8] Z. Ghahramani and G. E. Hinton, "Variational learning for switching state-space models," Neural
Comput, vol. 12, pp. 831-864, Apr 2000.

[9] K. P. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis,
University of California, Berkeley, 2002.

[10] J. A. Quinn, C. K. Williams, and N. McIntosh, "Factorial switching linear dynamical systems applied to
physiological condition monitoring," IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 31, pp. 1537-1551, 2009.

[11] R. Garnett, M. A. Osborne, S. Reece, A. Rogers, and S. J. Roberts, "Sequential Bayesian prediction in the
presence of changepoints and faults," The Computer Journal, vol. 53, no. 9, pp. 1430-1446, 2010.

[12] Y. Saatci, R. Turner, and C. E. Rasmussen, "Gaussian process change point models," in Proceedings of
the 27th International Conference on Machine Learning (ICML-10) (J. Fürnkranz and T. Joachims, eds.),
(Haifa, Israel), pp. 927-934, Omnipress, June 2010.

[13] R. P. Adams and D. J. MacKay, "Bayesian online changepoint detection," tech. rep., Cambridge, UK,
2007.

[14] P. Fearnhead and Z. Liu, "On-line inference for multiple changepoint problems," Journal of the Royal
Statistical Society: Series B (Statistical Methodology), vol. 69, no. 4, pp. 589-605, 2007.

[15] M. Alvarez, J. Peters, B. Schoelkopf, and N. Lawrence, "Switched latent force models for movement
segmentation," in Advances in Neural Information Processing Systems 23, pp. 55-63, 2010.

[16] M. Opper, A. Ruttor, and G. Sanguinetti, "Approximate inference in continuous time Gaussian-jump
processes," in Advances in Neural Information Processing Systems 23, pp. 1831-1839, 2010.

[17] F. Stimberg, M. Opper, G. Sanguinetti, and A. Ruttor, "Inference in continuous-time change-point
models," in Advances in Neural Information Processing Systems 24, pp. 2717-2725, 2011.

[18] F. Stimberg, A. Ruttor, and M. Opper, "Bayesian inference for change points in dynamical systems with
reusable states - a Chinese restaurant process approach," in JMLR W&CP 22, pp. 1117-1124, 2012.

[19] A. Gams, A. Ijspeert, S. Schaal, and J. Lenarcic, "On-line learning and modulation of periodic movements
with nonlinear dynamical systems," Autonomous Robots, vol. 27, pp. 3-23, 2009.
doi:10.1007/s10514-009-9118-y.

[20] A. H. Jazwinski, Stochastic Processes and Filtering Theory. Academic Press, 1970.

[21] S. Julier, J. Uhlmann, and H. Durrant-Whyte, "A new approach for filtering nonlinear systems," in
Proceedings of the American Control Conference, vol. 3, pp. 1628-1632, June 1995.

[22] E. A. Wan and R. van der Merwe, "The unscented Kalman filter," in Kalman Filtering and Neural
Networks (S. Haykin, ed.), John Wiley & Sons, Inc., 2001.

[23] A. Doucet, N. de Freitas, and N. Gordon, eds., Sequential Monte Carlo Methods in Practice. Springer,
2001.

[24] S. Särkkä, "On unscented Kalman filtering for state estimation of continuous-time nonlinear systems,"
IEEE Transactions on Automatic Control, vol. 52, pp. 1631-1641, Sept. 2007.






